Rule Induction for Click-Stream Analysis: Set Covering and Compositional Approach

نویسندگان

  • Petr Berka
  • Vladimír Las
  • Tomas Kocka
چکیده

We present a set covering algorithm and a compositional algorithm to describe sequences of www pages visits in click-stream data. The set covering algorithm utilizes the approach of rule specialization like the well known CN2 algorithm, the compositional algorithm is based on our original KEX algorithm, however both algorithms deal with sequences of events (visited pages) instead of sets of attribute-value pairs. The learned rules can be used to predict next page to be viewed by a user or to describe the most typical paths of www pages visitors and the dependencies among the www pages. We have successfully used both algorithms on real data from an internet shop and we mined useful information from the data.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Use of Robust Factor Analysis of Compositional Geochemical Data for the Recognition of the Target Area in Khusf 1:100000 Sheet, South Khorasan, Iran

The closed nature of geochemical data has been proven in many studies. Compositional data have special properties that mean that standard statistical methods cannot be used to analyse them. These data imply a particular geometry called Aitchison geometry in the simplex space. For analysis, the dataset must first be opened by the various transformations provided. One of the most popular of the a...

متن کامل

Rule learning for classification based on neighborhood covering reduction

Rough set theory has been extensively discussed in the domain of machine learning and data mining. Pawlak’s rough set theory offers a formal theoretical framework for attribute reduction and rule learning from nominal data. However, this model is not applicable to numerical data, which widely exist in real-world applications. In this work, we extend this framework to numerical feature spaces by...

متن کامل

A Local Branching Approach for the Set Covering Problem

The set covering problem (SCP) is a well-known combinatorial optimization problem. This paper investigates development of a local branching approach for the SCP. This solution strategy is exact in nature, though it is designed to improve the heuristic behavior of the mixed integer programming solver. The algorithm parameters are tuned by design of experiments approach. The proposed method is te...

متن کامل

Discovering New Rule Induction Algorithms with Grammar-based Genetic Programming

Rule induction is a data mining technique used to extract classification rules of the form IF (conditions) THEN (predicted class) from data. The majority of the rule induction algorithms found in the literature follow the sequential covering strategy, which essentially induces one rule at a time until (almost) all the training data is covered by the induced rule set. This strategy describes a b...

متن کامل

Application of continuous restricted Boltzmann machine to detect multivariate anomalies from stream sediment geochemical data, Korit, East of Iran

Anomaly separation using stream sediment geochemical data has an essential role in regional exploration. Many different techniques have been proposed to distinguish anomalous from study area. In this research, a continuous restricted Boltzmann machine (CRBM), which is a generative stochastic artificial neural network, was used to recognize the mineral potential area in Korit 1:100000 sheet, loc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005